Abstract - Low power consumption and high speed are currently
among the most important criteria in the fabrication of DSP
systems and other high-performance systems. Optimizing the speed
and power of the multiplier is therefore a major design issue.
However, area and speed are usually conflicting constraints, so
improving speed mostly results in larger area. In this project we
try to determine the best solution to this problem by comparing a
few multipliers and choosing the most suitable one for the
implementation of an FIR filter. This paper thus presents the
design of an FIR filter that is efficient not only in terms of
delay and speed but also in terms of power. The simulations have
been carried out using the Xilinx ISE tool.

Index Terms - DSP, FIR filter, Multiplier, Xilinx ISE

I. Introduction 

The multiplier [1]-[3], [5] is one of the key hardware blocks in
most high-performance systems such as digital signal processors
and microprocessors [2]. With the fast advances in technology,
many researchers are working on ever more efficient multipliers
[5]. The key requirement is not only higher speed and lower power
consumption but also reduced silicon area, which makes such
multipliers well suited for various complex and portable VLSI
circuit [6] implementations. In practice, however, area and
speed are two conflicting performance factors: increased speed
usually results in larger area. In this paper, we look for a
better trade-off between the two by realizing a marginally
decreased delay, which increases the speed performance [3], at
the cost of a small rise in area, that is, in the number of
transistors [6]. The new design lowers the delay of the widely
used Wallace tree multiplier [7]. Structural optimization is
performed on the conventional multiplier in such a way that the
latency of the total circuit is reduced significantly. The
Wallace tree basically multiplies two unsigned integers [7]. In
this project we compare the working and the characteristics of
different multipliers [8] individually and then choose the most
suitable multiplier by implementing each of them separately in
an FIR filter.

Parallel multipliers such as the radix-2 and radix-4 Booth
multipliers and the Wallace multiplier [8] perform the
computation using fewer adders and thus fewer iterative steps,
so they require less space than the serial multiplier. Here we
compare the Booth and Wallace multipliers [13] [15] to find the
more efficient one. Area is a very important factor because the
fabrication of chips [1] and high-performance systems requires
components that are as small as possible and occupy little
space.

II. FIR Filter Theory

Finite Impulse Response (FIR) filters are a type of digital
filter whose impulse response is finite in length [3]. They are
non-recursive digital filters, so the current output is
calculated solely from the current and previous input values.
The FIR filter has been selected for this work because of these
good characteristics [3] [9]. An FIR filter has no feedback, and
its input-output relation is given by

$y[n] = \sum_{k=0}^{N-1} a[k]\, x[n-k]$ (1)

Here, x[n] and y[n] are the filter input and filter output
respectively, a[k] are the filter coefficients, and N is the
number of filter coefficients. The summation runs from k = 0 to
k = N-1 over the feed-forward taps of the FIR filter. The
transfer function of the FIR filter can be represented as [3]
[9]:

$H(z) = \sum_{n=0}^{N-1} h[n]\, z^{-n}$ (2)

where h[n] is the impulse response, which for an FIR filter is
equal to the coefficients a[n].

For FIR filter realization (both hardware and software), the
impulse response realized in the time domain is of more concern.
The transfer function is obtained as the z-transform of the FIR
filter impulse response [9].
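
The non-recursive input-output relation described above can be
illustrated with a minimal Python sketch. It assumes the standard
direct form y[n] = sum of a[k] x[n-k] and treats input samples
before the start of the sequence as zero; the function name
fir_filter and the sample coefficients are illustrative and do
not come from the implemented design.

```python
def fir_filter(x, a):
    """Direct-form FIR filter: y[n] = sum over k of a[k] * x[n - k].

    Samples before the start of the input (n - k < 0) are assumed
    to be zero, which is an assumption of this sketch.
    """
    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(len(a)):
            if n - k >= 0:
                acc += a[k] * x[n - k]   # one multiply-accumulate per tap
        y.append(acc)
    return y


# A unit impulse at the input returns the coefficients themselves,
# i.e. the (finite) impulse response of the filter.
impulse_response = fir_filter([1, 0, 0, 0, 0], [1, 2, 3])
```

Each output sample needs one multiplication per tap, which is why
the multiplier dominates the speed and area of the filter.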


III. Digital Adders

In digital electronics, an adder is a digital circuit that
performs the addition of two numbers. As described in [10], in
many computers and other kinds of processors, adders are used
not only in the ALU(s) but also in other parts of the processor,
where they calculate addresses, table indices, and more.

A. Ripple Carry Adder 

A ripple carry adder is a digital circuit that produces the
arithmetic sum of two binary numbers. It is constructed by
cascading full adders [12], with the carry output of each full
adder connected to the carry input of the next full adder in the
chain. Figure 1 shows the interconnection of four full adder
(FA) circuits to form a 4-bit ripple carry adder [9] [12]. As
can be seen from the figure, the input carry enters from the
right side because the first cell traditionally represents the
least significant bit (LSB). Bits a0 and b0 in the figure
represent the least significant bits of the numbers to be added,
and s0-s3 represent the sum output.

Figure 1 Ripple Carry Adder
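
The cascading of full adders can be sketched in a few lines of
Python. This is only a behavioural model of the 4-bit chain
described above, not the VHDL used in this work; the names
full_adder and ripple_carry_add are chosen here for illustration.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder returning (sum, carry_out)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return s, carry_out


def ripple_carry_add(a, b, width=4, carry_in=0):
    """Add two unsigned width-bit integers with cascaded full adders."""
    carry = carry_in
    total = 0
    for i in range(width):              # bit 0 is the LSB (a0, b0)
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i                 # the carry ripples to stage i + 1
    return total, carry                 # (sum bits s3..s0, final carry out)
```

Because every stage has to wait for the carry of the previous
stage, the delay of this adder grows linearly with the word
length.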


B. Carry Look Ahead Adder 

The Carry Look Ahead Adder (CLA) is designed to eliminate the
latency introduced by the rippling of the carry bits in the RCA.
The CLA improves speed by reducing the amount of time required
to determine the carry bits [12]. It uses the concepts of
generating (G) and propagating (P) carries: two signals, P and
G, are formed for each bit position [9] [12]. The carry, P and G
are given below.

$C_{i+1} = G_i + P_i \cdot C_i$ (3)

Here, $G_i = A_i \cdot B_i$ and $P_i = A_i \oplus B_i$, and

$S_i = A_i \oplus B_i \oplus C_i = P_i \oplus C_i$.

$S_i$ and $C_{i+1}$ represent the sum and carry of the i-th full
adder, respectively.
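
The generate and propagate relations can be checked with a small
Python model. It is a sketch under the assumption of unsigned
operands; the function name cla_add is hypothetical. Note that
the loop evaluates the carry recurrence one position after the
other, whereas in hardware the recurrence is expanded so that all
carries are formed directly from G, P and C0.

```python
def cla_add(a, b, width=4, c0=0):
    """Behavioural model of a carry look ahead adder."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]  # Gi = Ai.Bi
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]  # Pi = Ai xor Bi

    carries = [c0]
    for i in range(width):
        # C(i+1) = Gi + Pi.Ci ; hardware flattens this into two-level logic
        carries.append(g[i] | (p[i] & carries[i]))

    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i                        # Si = Pi xor Ci
    return total, carries[width]
```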


Figure 2 Carry Look ahead Adder


The carry look ahead adder can be split into two modules:

(1) The Partial Full Adder (PFA), which generates Si, Pi and Gi.

(2) The Carry Look-Ahead Logic, which generates the carry-out
bits.


C. Carry Select Adder

A carry select adder employs a logic element that evaluates the
(n+1)-bit sum of two n-bit numbers. The carry select adder [10]
[12] usually consists of two ripple carry adders and a
multiplexer. The sum of two n-bit numbers is computed with two
adders (hence two ripple carry adders) that perform the addition
twice, one time assuming a carry-in of zero and the other
assuming a carry-in of one. Once the actual carry-in is known,
the multiplexer selects the corresponding sum and carry-out.


Figure 3 Carry Select Adder


D. Carry Save Adder

A Carry Save Adder (CSA) is a type of digital adder [9] [10]
with low carry propagation delay: instead of adding two input
numbers to a single sum output, it adds three input numbers to
an output pair of numbers. When its two outputs are then summed
using a carry look ahead or ripple carry adder [12], we obtain
the sum of all three inputs.


IV. Digital Multipliers


A binary multiplier is an electronic circuit used in digital
electronics, for example in a computer or other electronic
device, to perform rapid multiplication of two numbers in binary
representation [5] [11]. It is built using binary adders [10].
The multiplier plays an important role in today’s digital signal
processing and various other applications [2]. In
high-performance systems such as microprocessors and DSPs,
addition and multiplication of two binary numbers are essential
and are the most frequently used arithmetic operations.
Statistics show that addition and multiplication are performed
in almost 70% of the instructions in microprocessors and in most
DSP algorithms [2].


A. Array Multiplier 


[Figure: the multiplicand feeds an array structure of compressors
arranged in Levels 1 to 4, which produces the RESULT.]

Figure 4 Array Multiplier

The array multiplier is well known for its regular structure.
The multiplication process is based on a repeated add-and-shift
procedure. Each partial product is generated by multiplying the
multiplicand with one multiplier digit; the partial products are
then shifted according to their bit positions and added [9]
[13]. The summation can be performed with a normal carry
propagation adder. In total, N-1 adders are required, where N is
the number of multiplier bits. The n-operand array consists of
n-2 compressors.
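
The add-and-shift procedure can be sketched as follows. This is
a behavioural Python illustration for unsigned operands, not the
gate-level array; the name array_multiply is hypothetical. The
second return value counts the additions and makes the N-1 adder
rows visible.

```python
def array_multiply(multiplicand, multiplier, n=4):
    """Unsigned n-bit multiplication by shifted partial products."""
    # One partial product per multiplier bit, shifted to its bit position.
    partial_products = [
        (multiplicand if (multiplier >> i) & 1 else 0) << i
        for i in range(n)
    ]

    result = partial_products[0]
    additions = 0
    for pp in partial_products[1:]:
        result += pp          # one row of carry-propagate addition
        additions += 1
    return result, additions  # additions is n - 1
```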

B. Radix-2 Booth Multiplier 

This technique allows for smaller, faster multiplication
circuits by recoding the numbers that are multiplied. Recoding
reduces the number of partial products that have to be added,
ideally by a factor of 2, so that only about half of them are
needed during the computation. Booth’s algorithm multiplies
signed binary numbers in 2’s complement representation [9]
[13]-[14]. Let R and M be the multiplicand and multiplier
respectively, and let n and q represent the number of bits in R
and M. Take the 2’s complement of R, which gives -R. For the
calculation, make a table with the variables U, V, X and X-1.

Step 1:

a) Fill the value of M into column X of the table.

b) Fill 0 into column X-1; it holds the previous least
significant bit of M.

c) Fill 0 into columns U and V, which will show the product of
R and M at the end of the multiplication.

d) Provide n rows, one for every cycle, because n-bit numbers
are multiplied.

Table 1 Making of Booth table

                  U      V      X      X-1
Load the value    0000   0000   1100   0
1st cycle
2nd cycle
3rd cycle
4th cycle

Step 2: Booth’s algorithm requires evaluation of the multiplier
bits and shifting of the partial product. Use the least
significant bit of the multiplier (column X) and the previous
least significant bit (column X-1) to determine the arithmetic
action.


Table 2 Shift in Booth table



1. If the two bits are 00, no change.

2. If the two bits are 11, no change.

3. If the two bits are 01, add R to U.

4. If the two bits are 10, add -R to U.


Table 3 Partial Products generation

U      V      X      X-1
0000   0000   1100   0
0000   0000   0110   0
0000   0000   0011   0

Table 4 Final Shift

U      V      X      X-1
0000   0000   1100   0      Load the value
0000   0000   0110   0      Shift only
0000   0000   0011   0      Shift only
1110   0000   0011   0      Add -R
1111   0000   1001   1      Shift
1111   1000   1100   1      Shift only


Step 3:

a) Arithmetically shift the value calculated in cases 1-4 by a
single place to the right.

b) Take U and V together and shift them arithmetically to the
right, which preserves the sign bit of the 2’s complement
number. Hence the sign of a positive or a negative number
remains unchanged.

c) Circularly right shift X, so that a second register for the
M value is not needed.

d) Repeat the same steps until the n cycles are completed. The
answer is then shown in the last row of U and V.
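
The steps above can be traced with a short Python sketch. It
assumes that the table columns U and V hold the upper and lower
half of the product, that X holds the multiplier and that X-1
holds the bit shifted out last; the function name booth_radix2
is hypothetical. With R = 0010 and M = 1100, which appears to be
the example traced in the tables, the recorded rows follow the
values of Table 4. The sketch keeps U at n bits, so the most
negative multiplicand (for example 1000 with n = 4) is not
handled.

```python
def booth_radix2(r, m, n=4):
    """Radix-2 Booth multiplication of signed n-bit integers r and m."""
    mask = (1 << n) - 1
    u, v = 0, 0                     # product register pair U:V
    x, x_prev = m & mask, 0         # multiplier and previous LSB
    rows = [(u, v, x, x_prev)]      # "load the value"

    for _ in range(n):
        pair = ((x & 1) << 1) | x_prev
        if pair == 0b01:            # end of a run of ones: add R
            u = (u + r) & mask
            rows.append((u, v, x, x_prev))
        elif pair == 0b10:          # start of a run of ones: add -R
            u = (u - r) & mask
            rows.append((u, v, x, x_prev))

        # Arithmetic right shift of U:V, the sign bit of U is kept.
        sign = u >> (n - 1)
        v = ((v >> 1) | ((u & 1) << (n - 1))) & mask
        u = ((u >> 1) | (sign << (n - 1))) & mask

        # Circular right shift of X; the bit leaving X goes to X-1.
        x_prev = x & 1
        x = ((x >> 1) | (x_prev << (n - 1))) & mask
        rows.append((u, v, x, x_prev))

    product = (u << n) | v
    if product >> (2 * n - 1):      # interpret U:V as a signed 2n-bit value
        product -= 1 << (2 * n)
    return product, rows


product, rows = booth_radix2(2, -4, 4)   # R = 0010, M = 1100
```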


C. Radix-4 Booth Multiplier 

The shortcomings of radix-2 can be overcome by radix-4 [13] [14]
[15], which handles more than one bit of the multiplier in each
cycle. The modified Booth's algorithm starts by appending a zero
to the right of the LSB of the multiplier. Applied to a parallel
multiplier, this recoding scheme halves the number of partial
products, so the multiplication time and the hardware
requirement can be reduced [8] [14].

The radix-4 Booth algorithm examines strings of three bits
according to the following algorithm [14]:

a) Extend the sign bit by 1 position if required to ensure that
n is even.

b) Append a 0 to the right of the LSB of the multiplier.

c) According to the value of each three-bit vector, each partial
product will be 0, +y, -y, +2y or -2y.

The negative values of y are obtained by taking the 2’s
complement. The multiplication of y by 2 is performed by
shifting y left by one bit. As a result, in the implementation
of n-bit parallel multipliers, only n/2 partial products are
created [9] [14].

D. Wallace Tree Multiplier 

A Wallace tree multiplier [7] [15] is an efficient hardware
implementation of a digital circuit that multiplies two
integers. The number of partial products is reduced, and a carry
select adder is used for the addition of the partial products.

The Wallace tree is known for its good computation time when
adding multiple operands down to two outputs using 3:2 or 4:2
compressors or both. The Wallace tree ensures the lowest overall
delay [15].

Figure 5 Structure of Wallace Multiplier
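
The reduction of many operands to two outputs can be sketched in
Python as below. This is a simplified word-level model that uses
only 3:2 compressors (carry save adders) on whole shifted partial
products of unsigned operands, whereas a real Wallace tree groups
the bits column by column; the names compress_3_2 and
wallace_multiply are hypothetical, and the built-in addition at
the end stands in for the final adder.

```python
def compress_3_2(a, b, c):
    """3:2 compressor on whole words: three operands in, (sum, carry) out."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def wallace_multiply(a, b, n=8):
    """Unsigned n-bit multiplication with a tree of 3:2 compressors."""
    operands = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    levels = 0
    while len(operands) > 2:
        full = len(operands) // 3 * 3
        reduced = []
        for i in range(0, full, 3):            # compress groups of three
            reduced.extend(compress_3_2(*operands[i:i + 3]))
        reduced.extend(operands[full:])        # leftovers pass through
        operands = reduced
        levels += 1
    return sum(operands), levels               # final two-operand addition
```

For eight partial products the operand count shrinks as 8, 6, 4,
3, 2, so four compressor levels are needed before the final
addition.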

V. Results 


After analyzing the multipliers and comparing their
characteristics in terms of multiplication speed, number of
computations required and amount of hardware, we find that the
Wallace multiplier is much better than the Booth multipliers.
Having implemented the radix-2 and radix-4 Booth multipliers and
the Wallace multiplier, our analysis shows that the computation
speed of the Wallace multiplier is the highest.

In this project these multipliers are implemented with FIR
filters to compare characteristics such as speed, power
consumption, computations and hardware requirement of the
system. All the multipliers have been coded separately in VHDL
and simulated to obtain accurate output waveforms. We then
implement these multipliers separately with FIR filters using
computation techniques like FFT and DFT. This code is also
written in VHDL and simulated to obtain the RTL circuit of each
system. We also obtain the lookup table, which gives the exact
number of inputs and outputs, the number of slices required,
etc. for the system. The Xilinx Estimator analyzes these
simulated results and determines the power consumption of each
system. The results are given below.


Table 5 Area & Delay of Radix-4 Booth multiplier

No. of slices          204
No. of 4-input LUTs    391
No. of bonded I/Os     65
Delay (ns)             6.177


Table 6 Area & Delay of Wallace Multiplier

No. of slices          9
No. of 4-input LUTs    16
No. of bonded I/Os     16
Delay (ns)             5.895


Table 7 FIR using Different multipliers

Type of multiplier         No. of   No. of 4-input   No. of bonded   No. of slice   Delay
                           slices   LUTs             I/Os            FFs
Radix-4 Booth Multiplier   157      297              26              45             15.44 ns
Wallace Multiplier         27       47               21              23             8.511 ns
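
The relative savings implied by Table 7 can be worked out with a
few lines of Python. The numbers are the Table 7 entries, with
the delay values read as 15.44 ns and 8.511 ns; the dictionary
layout and the helper reduction_percent are illustrative only.

```python
# Table 7 entries for the FIR filter built with each multiplier.
fir_results = {
    "Radix-4 Booth": {"slices": 157, "luts": 297, "ios": 26,
                      "slice_ffs": 45, "delay_ns": 15.44},
    "Wallace":       {"slices": 27, "luts": 47, "ios": 21,
                      "slice_ffs": 23, "delay_ns": 8.511},
}


def reduction_percent(baseline, improved):
    """Relative saving of 'improved' with respect to 'baseline' in percent."""
    return 100.0 * (baseline - improved) / baseline


booth, wallace = fir_results["Radix-4 Booth"], fir_results["Wallace"]
savings = {key: reduction_percent(booth[key], wallace[key]) for key in booth}

for key, value in savings.items():
    print(f"{key}: {value:.1f} % lower with the Wallace multiplier")
```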


VI. Conclusion 

This paper presents a clear model of different multipliers and
their implementation in a tapped-delay FIR filter. We found that
the Wallace multiplier is a much better option than the serial
multiplier. We concluded this from the results for delay and
total area. In the case of the Wallace multiplier, the total
area is much less than that of the Booth multipliers, and hence
the power consumption is also lower. This is clearly depicted in
our results. It also speeds up the calculation and makes the
system faster. Comparing the radix-2 and radix-4 Booth
multipliers, we found that radix-4 consumes less power than
radix-2. Overall, we found that the Wallace multiplication
method is better than the other multipliers in terms of speed,
area and power. So by using the Wallace multiplier we can
achieve fast and efficient multiplication.


VII. Future Work 

One possible direction is to increase the number of bits of the
multiplier. We have only considered 8 bits for encoding, as it
is a simple and popular choice. Recoding a higher number of bits
may further reduce the number of LUTs and thus has the potential
to reduce the area.